People don't want data. What they really want is insight or, even better, actionable insight. Now, the road from data to insight can be a bit of a beast. Take Airbnb as an example: it started as a scrappy social hack and grew into a large, data-driven company. When they were small, so was their data, but as the company and its technical architecture grew in scale and complexity, leveraging that data became a challenge. It became more and more necessary to combine multiple messy data sources in novel ways, in the right order and on a strict schedule... using distributed computing... with proper logging and error recovery... gosh. Batch jobs, cron, sticky tape and bits of string soon proved insufficient.
Airflow is an Apache top-level project that was originally open-sourced by Airbnb. It's a seriously powerful tool for defining, scheduling, running and monitoring complicated workflows, and for distributing them across workers.
In this talk I'll give you a bit of a tour of Airflow's moving parts. I'll also talk a little about how we are leveraging Airflow at Umuzi.